Playground

Musings and ramblings through the world of a data scientist

Game of Craps - A Mathematical Exploration (Analytical and Simulation)


Game of Craps

This report details the analysis of the Game of Craps and answering a few interesting questions about the game. The rules of the game may be found here. A simple case is considered where one of the dice is biased such that it only turn values from 2 to 5.

First an analytical solution is provided, leveraging the tools of probability theory. Then a simulation of the game is also carried out to find approximate answers to the questions. A summary of the results is then presented, comparing the analytical and simulation results and the appeal of either approach for solving the problem

In [1]:
# imports

import numpy as np
import scipy as sp

import matplotlib.pyplot as plt
import scipy.stats as st
plt.rcParams['font.size'] = 13
%matplotlib inline

Part 1: Analytical Method (Probability Theory)

Let $X_1$ and $X_2$ be the random variable (r.v) representing the outcome of rolling the dice.

Furthermore, we let $X_1$ be the unbalance die:

$$ \implies X_1 \sim U \{2, 5\} \\ X_2 \sim U \{1, 6\} $$

where $U\{a, b\}$ is used here to represent the discrete uniform distribution supported on integers from $a$ to $b$

Then let sum of the dice roll be represented as the r.v $Y = X_1 + X_2$

Game rule summary:

  • roll the dice
  • if $Y \in \{ 7, 11\}$ this is a win, game over
  • if $Y \in \{ 2, 3, 12\}$ this is a loss, game over
  • Otherwise, record $Y$ and keep playing (rolling dice) until one of two things happen;
    • roll a 7 and lose the game
    • roll a value equal to the recorded Y value and win the game

Q1

For each number between 2 and 12, what is the probability of rolling the dice so that they sum to that number?

Derivation of distribution of dice sum

We could calculate this for each of the numbers separately but deriving the distribution of the sum would allow us to proceed more generally.

First the distribution of $Y$ is generalized as follows:

The sum of the roll yields a number $y$ if the first roll yields some value $x$ and the second die yields $y-x$. Since the outcome of individual die is independent we simply multiply the probabilities and sum over all possible first roll value $x$ (by the countable additivity axiom). This is expressed mathematically as follows:

$$ P_y(y) = P_{X_1 + X_2}(y) = \sum _{x_1} P_{X_1}(x_1) P_{X_2}(y-x_1) \ $$

Next we determine the bounds on the summation:

The following bounds apply for the value:

$$ 2 \le x_1 \le 5 \\ 1 \le x_2 \le 6 \\ \text{since } \ x_2 = y - x_1 \\ 1 \le y - x_1 \le 6 \\ \text{this can be re-written as $x_1$} bounds: \\ y-6 \le x_1 \le y-1 \\ $$

The above inqualities may be combined to give:

$$ max(2, y-6) \le x_1 \le min(5, y-1)$$

So we can re-write $P_Y(y)$ as follows:

$$ P_Y(y) = \begin{cases} P_{X_1 + X_2}(y) = \sum _{max(2, y-6)}^{min(5, y-1)} P_{X_1}(x_1) P_{X_2}(y-x_1) & \text{for $Y \in \{3, 4, \ldots 11\}$}\\ 0 & \text{otherwise} \end{cases} $$

Since these are discrete uniform distributions: $P_{X}(x) = \frac{1}{b-a+1}$

$$ \implies P_{X_1}(x_1) = \frac{1}{4} \ \text{and} \ P_{X_2}(y-x_1) = \frac{1}{6} \\ \therefore P_Y(y) = \sum _{max(2, y-6)}^{min(5, y-1)} \left(\frac{1}{4}\right) \left(\frac{1}{6} \right) \\ \boxed{P_Y(y) = \begin{cases} \left( 1+min(5, y-1)-max(2, y-6) \right) \cdot \left(\frac{1}{24}\right) & \text{for $Y \in \{3, 4, \ldots 11\}$} \\ 0 & \text{otherwise} \end{cases}} $$

We now have an equation for into which we could plug in a given value of y to get the probability mass. A function may be written to mimic the probability mass function as follows:

In [ ]:
def P_Y(y):
    
    """
        Probability mass function for the sum of the two dice.
        Returns the corresponding probability mass for a specified input 
        in the distribution of r.v Y
    """
    
    support = [3,4,5,6,7,8,9,10,11]
    
    if y in support:
        prob_of_y = (1 + min(5, y-1) - max(2, y-6)) * (1/24.0)
    else:
        prob_of_y = 0.0
    
    return prob_of_y

Generating the distribution

The distribution may now be generaated using the function above.

In [353]:
Y = np.array([2, 3,4,5,6,7,8,9,10,11,12])
probs = []
for y in Y:
    probs.append(P_Y(y))
    
assert np.sum(probs) - 1 < 1e-16 # sanity check to ensure probabilites sum to 1

The sanity check is done to make sure the probabilities sum to 1 as expected. For numerical concerns, proximity to 1 is used here.

In [355]:
plt.stem(Y, probs)
plt.xlabel('sum of dice rolls (Y)')
plt.ylabel('probabilities $P_Y(y)$')
plt.title("Probability Mass function of rolls" );

This distribution is in line with a rough idea of what is expected - since the distribution of the sum is a effectively a convolution of two rectangles (uniform distribution), it should ramp up, max out and ramp down.

The numbers are also presented in a table below.

In [417]:
probs_df = pd.DataFrame({'Sum of dice': Y, 'Probabilities': probs}, index=None)

probs_df.style.hide_index()
Out[417]:
Sum of dice Probabilities
2 0.000000
3 0.041667
4 0.083333
5 0.125000
6 0.166667
7 0.166667
8 0.166667
9 0.125000
10 0.083333
11 0.041667
12 0.000000

Q2

Q2a

What’s the probability of winning on the very first roll?

According to the game rules, the probability of winning on the very first roll is the probability of getting a 7 or and 11.

This can be written as:

$$P(Y \in \{7, 11\}) = P_Y(7) + P_Y(11)$$

This can be computed easily by employing the probability mass function defined above in the following code:

In [328]:
p_win_1st_roll = P_Y(7) + P_Y(11)
print(f"Probability of winning in first roll is {p_win_1st_roll:.3f}")  # 5/24
Probability of winning in first roll is 0.208

Q2b

What’s the probability of losing (“crapping out”) on the very first roll?

According to the game rules, the probability of losing on the very frist roll is the probability of rolling a sum of 2, 3 or 12

This can be written as:

$$P(Y \in \{2,3,12\}) = P_Y(2) + P_Y(3) + P_Y(12)$$
In [345]:
p_loss_1st_roll = P_Y(2) + P_Y(3) + P_Y(12)
print(f"Probability of losing on the first roll is {p_loss_1st_roll:.3f}") # = 1/24
Probability of losing on the first roll is 0.042

Q3

Suppose that on the first roll, you do not win or lose, but rather, you get the sum X, which has roll probability p. Given that you have already made it to this point, what’s your chance of winning going forward?

According to the game, if the first roll results in a sum, X, which does not result in a win or loss, the game can now be won only by rolling a sum equal to X again. The game is lost if the roll results in a 7. In other words, to win the game going forward, the sum of the roll must be equal to X before it is equal to 7, otherwise the game is lost.

Concretely, the probability of playing X before playing 7 is sought and this condition may be expressed as follows:

$$ X = \{4,5,6,8,9,10 \} \\ P(\text{roll X before rolling 7}) = \sum _{x \in X} P(X = x) P(\text{rolling x before X = 7}) $$

Considering an instance of X = x, that is, the first roll is $x \in X$:

Let each r.v representing subsequent rolls be expressed as $Y_i$. In order to win we can either roll $x \in X$ in the second roll, or roll a sum that is neither 7 nor x and then roll x, and so on. This game is assumed to proceed as long as a 7 is not rolled:

Note the $P_X(x)$ is same as the mass function previously defined, so the same function may be used.

$$ P(\text{roll x before rolling 7}) = P(Y_1 = x) + P(Y_1 \notin \{x, 7\}) P(Y_2=x) + P(Y_1 \notin \{x, 7\}) P(Y_2 \notin \{x, 7\}) P(Y_3=x) + \ldots $$

This result is a geometric progression and is clearer if $p(x)$ is substituted for $P(Y_i = x)$:

$$ P(\text{roll x before rolling 7} = p(x) + \left(1 - (p(x) + p(7)) \right) p(x) + \left(1 - (p(x) + p(7)) \right)^2 p(x) + \ldots\\ p(7) = P_Y(7) = \frac{1}{6} \\ = \sum _{t=0}^{\infty} \left(\frac{5}{6} - p(x)\right)^t p(x) \ \ \ \text{$t$ is the turn number}\\ = \frac{p(x)}{\frac{1}{6} + p(x)} $$

Putting it all together and computing for all $x \in X$:

$$ P(\text{roll X before rolling 7}) = \sum _{x \in X} p(x) \frac{p(x)}{\frac{1}{6} + p(x)} $$

This is solved in the following code snippet:

In [347]:
X = np.array([4,5,6,8,9,10])

pX_x = np.zeros(X.shape[0])
for i, x in enumerate(X):
    pX_x[i] = P_Y(x)

# computing the weighted sum usig the dot product
p_win_going_fwd = np.dot(pX_x, pX_x / (1/6 + pX_x))

print(f"Probability of winning with two or more rolls (going forward) is {p_win_going_fwd:.3f}")
Probability of winning with two or more rolls (going forward) is 0.329

Q4

If you play the game of craps with these two dice, you will get one dollar if you win, and lose one dollar if you lose, then what is the expected return for playing the game?

By the law of expectation, the expected return for playing the game is:

$$ P(\text{winning}) (1) + P(\text{losing}) (-1) \\ P(\text{winning}) = P(\text{winning in first roll}) \cup P(\text{winning going forward}) $$
In [319]:
p_win = p_win_1st_roll + p_win_going_fwd

p_loss = 1-p_win

exp_return = p_win * (1) + p_loss* (-1)
print(f" The expected return from playing the game is about {exp_return*100:.1f} cents")
 The expected return from playing the game is about 7.5 cents

The returns show that the odds are slightly in favour of the player, provided the player is willing to play for long enough.

In [349]:
print(f' Meanwhile the probability of winning is {p_win:.3f}, while the probability of losing is {p_loss:.3f}')
 Meanwhile the probability of winning is 0.538, while the probability of losing is 0.462

Simulation (S)

Remark: All that has been shown above is an analytical walkthrough of the problem which gives the actual probabilities (barring numerical precision issues). However, another way all of these could have been done is by simulation.

Simulation details: We can simulate several games which would serve as iid (independent and identically distributed) random instances of the result of a game - win or loss. In statistical terms, these will draw from the population of all games which could be used to compute relevant statistics such as the probability of winning. This is what is attempted in the following sections.

Running multiple games could be used to compute a statistic of relevance, however, it is also necessary to provide some measure of uncertainty for such statistics. To achieve this, a distribution is generated for the relevant statistics by performing multiple trials. This idea leverages the central limit theorem which states that distribution of sample means of independent and identically distributed random variables converges to a normal distribution as the number of samples increase.

This framework provides the tools to answer the questions by simulation and this is presented here also partly for verifying the results obtained in the analytical solutions above - both methods should present fairly similar results for fairly large samples.

Preliminaries

First we start by initializing random variables to serve as the two dice. Then the relevant functions are written. This includes,

  • a function that returns the result roll of dice
  • a function that returns the win/loss status of a game. For expediency, the function should also tell the nature of the win; whether it's first roll or not
  • a helper function that returns the result from multiple games
In [248]:
#initialize the random variables r

X1 = st.randint(2,5+1) # +1 because upper bound not included in scipy
X2 = st.randint(1,6+1)

Helper functions

In [405]:
def roll_dice(X1=X1, X2=X2):
    """returns the pair of results from rolling the dice as a tuple"""
    
    return X1.rvs(), X2.rvs()   


def get_game_result(X1, X2):
    
    """returns the result of one game play
    
    Returns:
    
    result (integer): representing corresponding win/loss types according to the following:
    1: "win 1st roll"
    2: "loss 1st roll"
    3: "win after first roll"
    4: "loss after first roll"
    """
    
    x1, x2 = roll_dice(X1, X2)
    roll_sum = x1+x2
    
    # win 1st roll condition
    if roll_sum in [7, 11]:
        game_result = 1
    
    # loss 1st roll condition
    elif roll_sum in [2, 3, 12]:
        game_result = 2
    else: 
        X = roll_sum  # recorded sum
        
        # now keep playing until new roll_sum is equal to X or 7
        x1, x2 = roll_dice(X1, X2)
        roll_sum = x1 + x2
        
        game_on = True
        while roll_sum != 7 and game_on:
            # win after first roll condition
            if roll_sum == X:
                game_result = 3
                game_on = False
            else:
                x1, x2 = roll_dice(X1, X2)
                roll_sum = x1 + x2
                
        # loss after first roll
        game_result = 4
        
    return game_result

    
def simulate_games(n=10000):
    """ returns a list of game results for n games played"""
    results = []
    for i in range(n):
        result = get_game_result(X1, X2)
        results.append(result)
        
    return results 

S-Q1

For each number between 2 and 12, what is the probability of rolling the dice so that they sum to that number?

To answer this question, the dice is rolled multiple times and the results of each roll are added to generate iid random samples. The probability of getting a particular value is estimated as the ratio of the number of occurence of that value to the number of samples.

The value are plotted agianst the proportions to show the sample distribution which approximates the actual distribution.

In [420]:
# run trials
n_trials = 100000

# initialize the space
prob_space = np.arange(0, 12+1)
props = np.zeros_like(prob_space, dtype=np.float16)

# draw random iid dice sums
rolls = [np.sum(roll_dice(X1, X2))  for i in range(n_trials)]

# get results and respective proportions
value, counts = np.unique(rolls, return_counts=True)
mass = counts / np.sum(counts)

props[value] = mass

plt.stem(prob_space[2:], props[2:])
#plt.xlim(0, 12)

plt.xlabel('sum of dice roll')
plt.ylabel('probability mass')
plt.title('sample distribution of the sum of dice roll');

The resulting histogram looks quite similar to the one obtained analytically as expected.

In [422]:
probs_df = pd.DataFrame({'Sum of dice': prob_space[2:], 'Probabilities': props[2:]}, index=None)

probs_df.style.hide_index()
Out[422]:
Sum of dice Probabilities
2 0.000000
3 0.041107
4 0.084351
5 0.124084
6 0.167236
7 0.166504
8 0.166748
9 0.125732
10 0.083069
11 0.041077
12 0.000000

S-Q2

S-Q2a

What’s the probability of winning on the very first roll?

For this result we will compute several trials on the probability of winning on the first roll. Each probability is obtained by playing several games and computing the fraction of games that are won in the first roll. we report the mean of the sample estimate for out statistic.

In [399]:
n_games = 1000
n_trials = 10000


iid_probs_roll1_win = []

for n_trial in range(n_trials):
    
    # get results from several games
    s_results = np.array(simulate_games(n_games))
    
    # win on 1st roll -> 1; compute probability of wthe win
    s_results = np.where(s_results == 1, 1, 0)
    s_prob = np.mean(s_results)
    
    iid_probs_roll1_win.append(s_prob)
    
# compute sample mean for the statistic   
prob_roll1_win_mean_est = np.mean(iid_probs_roll1_win)
print(f"An estimate of the probability of winning in the first roll is {prob_roll1_win_mean_est: .3f}")

plt.hist(iid_probs_roll1_win, bins=50,  weights=np.ones(n_trials)/n_trials);
plt.xlabel('Probabilities of winning in the first roll')
plt.ylabel('probability mass')
plt.title('sample distribution of probabilities of winning in the first roll');
An estimate of the probability of winning in the first roll is  0.208

S-Q2b

What’s the probability of losing (“crapping out”) on the very first roll?

We perform a similar routine as we have done above, but in this case, seek out the games lost on the first roll to compute out probabilities.

In [307]:
n_games = 1000
n_trials = 10000


iid_probs_roll1_loss = []
for n_trial in range(n_trials):
    
    s_results = np.array(simulate_games(n_games))
    
    # loss on 1st roll -> 2
    s_results = np.where(s_results == 2, 1, 0)
    s_prob = np.mean(s_results)
    
    iid_probs_roll1_loss.append(s_prob)

# compute sample mean for the statistic 
prob_roll1_loss_mean_est = np.mean(iid_probs_roll1_loss)
print(f"An estimate of the probability of losing in the first roll is {prob_roll1_loss_mean_est: .3f}")

plt.hist(iid_probs_roll1_loss, bins=50, weights=np.ones(n_trials)/n_trials);

plt.xlabel('Probabilities of losing in the first roll')
plt.ylabel('probability mass')
plt.title('sample distribution of probabilities of losing in the first roll');
An estimate of the probability of losing in the first roll is  0.042

S-Q3

Suppose that on the first roll, you do not win or lose, but rather, you get the sum X, which has roll probability p. Given that you have already made it to this point, what’s your chance of winning going forward?

The probabilities here are computed for wins in roll 2 and beyond.

In [308]:
n_games = 1000
n_trials = 10000


iid_probs_mult_rolls_win = []
for n_trial in range(n_trials):
    
    s_results = np.array(simulate_games(n_games))
    
    # win on subsequent rolls -> 3
    s_results = np.where(s_results == 3, 1, 0)
    s_prob = np.mean(s_results)
    
    iid_probs_mult_rolls_win .append(s_prob)

# compute sample mean for the statistic 
prob_mult_rolls_win_mean_est = np.mean(iid_probs_mult_rolls_win)
print(f"An estimate of the probability of winning going forward is {prob_mult_rolls_win_mean_est: .3f}")

plt.hist(iid_probs_mult_rolls_win , bins=50, weights=np.ones(n_trials)/n_trials);

plt.xlabel('Probabilities of winning going forward')
plt.ylabel('probability mass')
plt.title('sample distribution of probabilities of winning going forward after first roll');
An estimate of the probability of winning going forward is  0.329

S-Q4

If you play the game of craps with these two dice, you will get one dollar if you win, and lose one dollar if you lose, then what is the expected return for playing the game?

For the returns, we simply convert each game win and loss to a dollar gain and loss respectively.

In [309]:
# the expectation is approximated by the average over multiple games
n_games = 1000
n_trials = 10000


iid_returns = []
for n_trial in range(n_trials):
    
    s_results = np.array(simulate_games(n_games))

    # convert game win to dollar values
    s_results = np.where((s_results == 1) | (s_results == 3 ), 1, -1)
    s_return = np.mean(s_results)
    
    iid_returns.append(s_return)

returns_mean_est = np.mean(iid_returns)
print(f"An estimate of the returns on the game is {returns_mean_est: .3f} dollars")

plt.hist(iid_returns, bins=50, weights=np.ones(n_trials)/n_trials);

plt.xlabel('returns from games')
plt.ylabel('probability mass')
plt.title('sample distribution of game returns');
An estimate of the returns on the game is  0.076

Probability of winning the game

This checks the probability of winning the game overall as used in the analytical solution for computing the returns.

In [310]:
# distribution of win probabilities
n_trials = 1000
n_games = 10000


iid_probs = []
for n_trial in range(n_trials):
    
    s_results = np.array(simulate_games(n_games))
    s_results = np.where((s_results == 1) | (s_results == 3 ), 1, 0)
    s_win_prob = np.mean(s_results)
    
    iid_probs.append(s_win_prob)
    
prob_win_mean_est = np.mean(iid_probs)
print(f"An estimate of the probability of winning the game {prob_win_mean_est: .3f}")

    
plt.hist(iid_probs, bins=50, weights=np.ones(n_trials)/n_trials);

plt.xlabel('Probabilities of winning the game')
plt.ylabel('probability mass')
plt.title('sample distribution of probabilities of winning the game');
An estimate of the probability of winning the game  0.538

Final Remarks

Some questions related to the crap game have been tackled analytically and by using simulation tools.

The analytical results present exact results from the underlying probability distribution and laws. This was solved by first deriving a fairly general relationship for the random variable representing the sum of the dice for a roll. Other results are then derived by applying the relevant probability laws and reasonable indepednce assumptions.

The simulation provide another view of the solution to the problem, which is arguably more intuitive. Appropriate uniform random variables are set for the dice which are then used to play multiple games from which the statistic is computed. (Pseudo)Random sample draws are then generated, from which the relevant metric is computed; relying on the central limit theorem.

The results from the simulations match the analytical results closely, thus in some ways, validating the results while presenting another way to look at the problem.